Searched hist:0 (Results 226 - 250 of 40959) sorted by relevance

1234567891011>>

/linux-master/Documentation/netlink/specs/
H A Dovs_datapath.yaml643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
643ef4a6 Mon Mar 27 02:31:36 MDT 2023 Donald Hunter <donald.hunter@gmail.com> netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'test',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': {'dispatch-upcall-per-cpu',
'tc-recirc-sharing',
'unaligned'}},
{'dp-ifindex': 48,
'masks-cache-size': 256,
'megaflow-stats': {'cache-hits': 0,
'mask-hit': 0,
'masks': 0,
'pad1': 0,
'padding': 0},
'name': 'demo',
'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
'user-features': set()}]

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_datapath.yaml \
--do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ovs_vport.yaml \
--dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
'ifindex': 3,
'name': 'test',
'port-no': 0,
'stats': {'rx-bytes': 0,
'rx-dropped': 0,
'rx-errors': 0,
'rx-packets': 0,
'tx-bytes': 0,
'tx-dropped': 0,
'tx-errors': 0,
'tx-packets': 0},
'type': 'internal',
'upcall-pid': [0],
'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
/linux-master/arch/xtensa/boot/boot-elf/
H A Dbootstrap.Sdiff 0c692569 Mon Aug 13 21:43:10 MDT 2018 Max Filippov <jcmvbkbc@gmail.com> xtensa: clean up boot-elf/bootstrap.S

Drop unneeded headers, rewrite literal definitions with .literal.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
diff a9f2fc62 Tue Apr 12 20:20:02 MDT 2016 Max Filippov <jcmvbkbc@gmail.com> xtensa: cleanup MMU setup and kernel layout macros

Make kernel load address explicit, independent of the selected MMU
configuration and configurable from Kconfig. Do not restrict it to the
first 512MB of the physical address space.

Cleanup kernel memory layout macros:

- rename VECBASE_RESET_VADDR to VECBASE_VADDR, XC_VADDR to VECTOR_VADDR;
- drop VIRTUAL_MEMORY_ADDRESS and LOAD_MEMORY_ADDRESS;
- introduce PHYS_OFFSET and use it in __va and __pa definitions;
- synchronize MMU/noMMU vectors, drop unused NMI vector;
- replace hardcoded vectors offset of 0x3000 with Kconfig symbol.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
diff 0d848afe Fri Oct 16 09:30:37 MDT 2015 Max Filippov <jcmvbkbc@gmail.com> xtensa: drop unused sections and remapped reset handlers

There are no .bootstrap or .ResetVector.text sections linked to the
vmlinux image, drop these sections from vmlinux.ld.S. Drop
RESET_VECTOR_VADDR definition only used for .ResetVector.text.

Drop remapped copies of primary and secondary reset vectors, as modern
gdb don't have problems stepping through instructions at arbitrary
locations. Drop corresponding sections from the corresponding linker
scripts.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
diff ccd0ef38 Thu Oct 02 12:03:27 MDT 2014 Max Filippov <jcmvbkbc@gmail.com> xtensa: nommu: fix Image.elf reset code and ld script

Don't hardcode kernel entry address as 0x3000 or 0xd0003000, use
LOAD_MEMORY_ADDRESS macro. Don't compile MMU remapping code and don't try
to link it when building noMMU configuration.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
diff ccd0ef38 Thu Oct 02 12:03:27 MDT 2014 Max Filippov <jcmvbkbc@gmail.com> xtensa: nommu: fix Image.elf reset code and ld script

Don't hardcode kernel entry address as 0x3000 or 0xd0003000, use
LOAD_MEMORY_ADDRESS macro. Don't compile MMU remapping code and don't try
to link it when building noMMU configuration.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
H A Dboot.lds.Sdiff a9f2fc62 Tue Apr 12 20:20:02 MDT 2016 Max Filippov <jcmvbkbc@gmail.com> xtensa: cleanup MMU setup and kernel layout macros

Make kernel load address explicit, independent of the selected MMU
configuration and configurable from Kconfig. Do not restrict it to the
first 512MB of the physical address space.

Cleanup kernel memory layout macros:

- rename VECBASE_RESET_VADDR to VECBASE_VADDR, XC_VADDR to VECTOR_VADDR;
- drop VIRTUAL_MEMORY_ADDRESS and LOAD_MEMORY_ADDRESS;
- introduce PHYS_OFFSET and use it in __va and __pa definitions;
- synchronize MMU/noMMU vectors, drop unused NMI vector;
- replace hardcoded vectors offset of 0x3000 with Kconfig symbol.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
diff 0d848afe Fri Oct 16 09:30:37 MDT 2015 Max Filippov <jcmvbkbc@gmail.com> xtensa: drop unused sections and remapped reset handlers

There are no .bootstrap or .ResetVector.text sections linked to the
vmlinux image, drop these sections from vmlinux.ld.S. Drop
RESET_VECTOR_VADDR definition only used for .ResetVector.text.

Drop remapped copies of primary and secondary reset vectors, as modern
gdb don't have problems stepping through instructions at arbitrary
locations. Drop corresponding sections from the corresponding linker
scripts.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
diff ccd0ef38 Thu Oct 02 12:03:27 MDT 2014 Max Filippov <jcmvbkbc@gmail.com> xtensa: nommu: fix Image.elf reset code and ld script

Don't hardcode kernel entry address as 0x3000 or 0xd0003000, use
LOAD_MEMORY_ADDRESS macro. Don't compile MMU remapping code and don't try
to link it when building noMMU configuration.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
diff ccd0ef38 Thu Oct 02 12:03:27 MDT 2014 Max Filippov <jcmvbkbc@gmail.com> xtensa: nommu: fix Image.elf reset code and ld script

Don't hardcode kernel entry address as 0x3000 or 0xd0003000, use
LOAD_MEMORY_ADDRESS macro. Don't compile MMU remapping code and don't try
to link it when building noMMU configuration.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
diff e85e335f Mon Dec 03 04:01:43 MST 2012 Max Filippov <jcmvbkbc@gmail.com> xtensa: add MMU v3 support

MMUv3 comes out of reset with identity vaddr -> paddr mapping in the TLB
way 6:

Way 6 (512 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0x00000000 0x00000000 0x01 0x03 RWX Bypass
0x20000000 0x20000000 0x01 0x03 RWX Bypass
0x40000000 0x40000000 0x01 0x03 RWX Bypass
0x60000000 0x60000000 0x01 0x03 RWX Bypass
0x80000000 0x80000000 0x01 0x03 RWX Bypass
0xa0000000 0xa0000000 0x01 0x03 RWX Bypass
0xc0000000 0xc0000000 0x01 0x03 RWX Bypass
0xe0000000 0xe0000000 0x01 0x03 RWX Bypass

This patch adds remapping code at the reset vector or at the kernel
_start (depending on CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX) that
reconfigures MMUv3 as MMUv2:

Way 5 (128 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xd0000000 0x00000000 0x01 0x07 RWX WB
0xd8000000 0x00000000 0x01 0x03 RWX Bypass
Way 6 (256 MB)
Vaddr Paddr ASID Attr RWX Cache
---------- ---------- ---- ---- --- -------
0xe0000000 0xf0000000 0x01 0x07 RWX WB
0xf0000000 0xf0000000 0x01 0x03 RWX Bypass

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
/linux-master/tools/testing/selftests/bpf/prog_tests/
H A Dlsm_cgroup.cdiff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
/linux-master/include/linux/usb/
H A Dgadget_configfs.h88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
88af8bbe Sun Dec 23 13:10:24 MST 2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de> usb: gadget: the start of the configfs interface

|# modprobe dummy_hcd num=2
|# modprobe libcomposite

|# lsmod
|Module Size Used by
|libcomposite 31648 0
|dummy_hcd 19871 0

|# mkdir /sys/kernel/config/usb_gadget/oha
|# cd /sys/kernel/config/usb_gadget/oha
|# mkdir configs/def.1
|# mkdir configs/def.2
|# mkdir functions/acm.ttyS1
|# mkdir strings/0x1
|mkdir: cannot create directory `strings/0x1': Invalid argument
|# mkdir strings/0x409
|# mkdir strings/1033
|mkdir: cannot create directory `strings/1033': File exists
|# mkdir strings/1032
|# mkdir configs/def.1/strings/0x409
|# mkdir configs/def.2/strings/0x409

|#find . -ls
| 975 0 drwxr-xr-x 5 root root 0 Dec 23 17:40 .
| 978 0 drwxr-xr-x 4 root root 0 Dec 23 17:43 ./strings
| 4100 0 drwxr-xr-x 2 root root 0 Dec 23 17:43 ./strings/1032
| 995 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/serialnumber
| 996 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/product
| 997 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/1032/manufacturer
| 2002 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./strings/0x409
| 998 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/serialnumber
| 999 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/product
| 1000 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./strings/0x409/manufacturer
| 977 0 drwxr-xr-x 4 root root 0 Dec 23 17:41 ./configs
| 4081 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./configs/def.2
| 4082 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.2/strings
| 2016 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.2/strings/0x409
| 1001 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/strings/0x409/configuration
| 1002 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/bmAttributes
| 1003 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.2/MaxPower
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 ./configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:42 ./configs/def.1/strings/0x409
| 1004 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/strings/0x409/configuration
| 1005 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/bmAttributes
| 1006 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./configs/def.1/MaxPower
| 976 0 drwxr-xr-x 3 root root 0 Dec 23 17:41 ./functions
| 981 0 drwxr-xr-x 2 root root 0 Dec 23 17:41 ./functions/acm.ttyS1
| 1007 0 -r--r--r-- 1 root root 4096 Dec 23 17:43 ./functions/acm.ttyS1/port_num
| 1008 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./UDC
| 1009 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdUSB
| 1010 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bcdDevice
| 1011 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idProduct
| 1012 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./idVendor
| 1013 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bMaxPacketSize0
| 1014 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceProtocol
| 1015 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceSubClass
| 1016 0 -rw-r--r-- 1 root root 4096 Dec 23 17:43 ./bDeviceClass

|# cat functions/acm.ttyS1/port_num
|0
|# ls -lah /dev/ttyGS*
|crw-rw---T 1 root dialout 252, 0 Dec 23 17:41 /dev/ttyGS0
|
|# echo 0x1234 > idProduct
|# echo 0xabcd > idVendor
|# echo 1122 > strings/0x409/serialnumber
|# echo "The manufacturer" > strings/0x409/manufacturer
|# echo 1 > strings/1032/manufacturer
|# echo 1sa > strings/1032/product
|# echo tada > strings/1032/serialnumber
|echo "Primary configuration" > configs/def.1/strings/0x409/configuration
|# echo "Secondary configuration" > configs/def.2/strings/0x409/configuration
|# ln -s functions/acm.ttyS1 configs/def.1/
|# ln -s functions/acm.ttyS1 configs/def.2/
|find configs/def.1/ -ls
| 979 0 drwxr-xr-x 3 root root 0 Dec 23 17:49 configs/def.1/
| 6264 0 lrwxrwxrwx 1 root root 0 Dec 23 17:48 configs/def.1/acm.ttyS1 -> ../../../../usb_gadget/oha/functions/acm.ttyS1
| 980 0 drwxr-xr-x 3 root root 0 Dec 23 17:42 configs/def.1/strings
| 5122 0 drwxr-xr-x 2 root root 0 Dec 23 17:49 configs/def.1/strings/0x409
| 6284 0 -rw-r--r-- 1 root root 4096 Dec 23 17:47 configs/def.1/strings/0x409/configuration
| 6285 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/bmAttributes
| 6286 0 -rw-r--r-- 1 root root 4096 Dec 23 17:49 configs/def.1/MaxPower
|
|echo 120 > configs/def.1/MaxPower
|
|# ls -lh /sys/class/udc/
|total 0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.0 -> ../../devices/platform/dummy_udc.0/udc/dummy_udc.0
|lrwxrwxrwx 1 root root 0 Dec 23 17:50 dummy_udc.1 -> ../../devices/platform/dummy_udc.1/udc/dummy_udc.1
|# echo dummy_udc.0 > UDC
|# lsusb
|Bus 001 Device 002: ID abcd:1234 Unknown
|
|lsusb -d abcd:1234 -v
|Device Descriptor:
…
| idVendor 0xabcd Unknown
| idProduct 0x1234
| bcdDevice 3.06
| iManufacturer 1 The manufacturer
| iProduct 2
| iSerial 3 1122
| bNumConfigurations 2
…
|echo "" > UDC

v5…v6
- wired up strings with usb_gstrings_attach()
- add UDC attribe. Write "udc-name" will bind the gadget. Write an empty
string (it should contain \n since 0 bytes write get optimzed away)
will unbind the UDC from the gadget. The name of available UDCs can be
obtained from /sys/class/udc/

v4…v5
- string rework. This will add a strings folder incl. language code like
strings/409/manufacturer
as suggested by Alan.
- rebased ontop reworked functions.c which has usb_function_instance
which is used prior after "mkdir acm.instance" and can be directly
used for configuration via configfs.

v3…v4
- moved functions from the root folde down to the gadget as suggested
by Michał
- configs have now their own configs folder as suggested by Michał.
The folder is still name.bConfigurationValue where name becomes the
sConfiguration. Is this usefull should we just stilc
configs/bConfigurationValue/ ?
- added configfs support to the ACM function. The port_num attribute is
exported by f_acm. An argument has been added to the USB alloc
function to distinguish between "old" (use facm_configure() to
configure and configfs interface (expose a config_node).
The port_num is currently a dumb counter. It will
require some function re-work to make it work.

scheduled for v5:
- sym linking function into config.

v2…v3
- replaced one ifndef by ifdef as suggested by Micahał
- strstr()/strchr() function_make as suggested by Micahł
- replace [iSerialNumber|iProduct|iManufacturer] with
[sSerialNumber|sProduct|sManufacturer] as suggested by Alan
- added creation of config descriptors

v1…v2
- moved gadgets from configfs' root directory into /udcs/ within our
"usb_gadget" folder. Requested by Andrzej & Michał
- use a dot as a delimiter between function's name and its instance's name
as suggested by Michał
- renamed all config_item_type, configfs_group_operations, make_group,
drop_item as suggested by suggested by Andrzej to remain consisten
within this file and within other configfs users
- Since configfs.c and functions.c are now part of the udc-core module,
the module itself is now called udc. Also added a tiny ifdef around
init code becuase udc-core is subsys init and this is too early for
configfs in the built-in case. In the module case, we can only have
one init function.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
/linux-master/drivers/staging/rtl8723bs/core/
H A Drtw_sta_mgt.cdiff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8f434708 Wed Mar 02 03:16:36 MST 2022 Hans de Goede <hdegoede@redhat.com> staging: rtl8723bs: Fix access-point mode deadlock

Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.

[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:

[ +0.095950] Possible unsafe locking scenario:

[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***

[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
/linux-master/net/ipv4/
H A Dtcp_cdg.cdiff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 72e560cb Tue Oct 11 16:07:48 MDT 2022 Eric Dumazet <edumazet@google.com> tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
/linux-master/tools/perf/trace/beauty/
H A Dclone.cdiff 2316ef58 Tue Mar 19 11:49:32 MDT 2024 Arnaldo Carvalho de Melo <acme@redhat.com> perf beauty: Introduce scrape script for 'clone' syscall 'flags' argument

It was using the first variation on producing a string representation
for a binary flag, one that used the copy of uapi/linux/sched.h with
preprocessor tricks that had to be updated everytime a new flag was
introduced.

Use the more recent scrape script + strarray + strarray__scnprintf_flags() combo.

$ tools/perf/trace/beauty/clone.sh | head -5
static const char *clone_flags[] = {
[ilog2(0x00000100) + 1] = "VM",
[ilog2(0x00000200) + 1] = "FS",
[ilog2(0x00000400) + 1] = "FILES",
[ilog2(0x00000800) + 1] = "SIGHAND",
$

Now we can move uapi/linux/sched.h from tools/include/, that is used for
building perf to the scrape only directory tools/perf/trace/beauty/include.

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZfnULIn3XKDq0bpc@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff 2316ef58 Tue Mar 19 11:49:32 MDT 2024 Arnaldo Carvalho de Melo <acme@redhat.com> perf beauty: Introduce scrape script for 'clone' syscall 'flags' argument

It was using the first variation on producing a string representation
for a binary flag, one that used the copy of uapi/linux/sched.h with
preprocessor tricks that had to be updated everytime a new flag was
introduced.

Use the more recent scrape script + strarray + strarray__scnprintf_flags() combo.

$ tools/perf/trace/beauty/clone.sh | head -5
static const char *clone_flags[] = {
[ilog2(0x00000100) + 1] = "VM",
[ilog2(0x00000200) + 1] = "FS",
[ilog2(0x00000400) + 1] = "FILES",
[ilog2(0x00000800) + 1] = "SIGHAND",
$

Now we can move uapi/linux/sched.h from tools/include/, that is used for
building perf to the scrape only directory tools/perf/trace/beauty/include.

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZfnULIn3XKDq0bpc@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff 2316ef58 Tue Mar 19 11:49:32 MDT 2024 Arnaldo Carvalho de Melo <acme@redhat.com> perf beauty: Introduce scrape script for 'clone' syscall 'flags' argument

It was using the first variation on producing a string representation
for a binary flag, one that used the copy of uapi/linux/sched.h with
preprocessor tricks that had to be updated everytime a new flag was
introduced.

Use the more recent scrape script + strarray + strarray__scnprintf_flags() combo.

$ tools/perf/trace/beauty/clone.sh | head -5
static const char *clone_flags[] = {
[ilog2(0x00000100) + 1] = "VM",
[ilog2(0x00000200) + 1] = "FS",
[ilog2(0x00000400) + 1] = "FILES",
[ilog2(0x00000800) + 1] = "SIGHAND",
$

Now we can move uapi/linux/sched.h from tools/include/, that is used for
building perf to the scrape only directory tools/perf/trace/beauty/include.

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZfnULIn3XKDq0bpc@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff 2316ef58 Tue Mar 19 11:49:32 MDT 2024 Arnaldo Carvalho de Melo <acme@redhat.com> perf beauty: Introduce scrape script for 'clone' syscall 'flags' argument

It was using the first variation on producing a string representation
for a binary flag, one that used the copy of uapi/linux/sched.h with
preprocessor tricks that had to be updated everytime a new flag was
introduced.

Use the more recent scrape script + strarray + strarray__scnprintf_flags() combo.

$ tools/perf/trace/beauty/clone.sh | head -5
static const char *clone_flags[] = {
[ilog2(0x00000100) + 1] = "VM",
[ilog2(0x00000200) + 1] = "FS",
[ilog2(0x00000400) + 1] = "FILES",
[ilog2(0x00000800) + 1] = "SIGHAND",
$

Now we can move uapi/linux/sched.h from tools/include/, that is used for
building perf to the scrape only directory tools/perf/trace/beauty/include.

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZfnULIn3XKDq0bpc@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff c65c83ff Fri Dec 14 13:06:47 MST 2018 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Allow asking for not suppressing common string prefixes

So far we've been suppressing common stuff such as "MAP_" in the mmap
flags, showing "SHARED" instead of "MAP_SHARED", allow for those
prefixes (and a few suffixes) to be shown:

# trace -e *map,open*,*seek sleep 1
openat("/etc/ld.so.cache", CLOEXEC) = 3
mmap(0, 109093, READ, PRIVATE, 3, 0) = 0x7ff61c695000
openat("/lib64/libc.so.6", CLOEXEC) = 3
lseek(3, 792, SET) = 792
mmap(0, 8192, READ|WRITE, PRIVATE|ANONYMOUS) = 0x7ff61c693000
lseek(3, 792, SET) = 792
lseek(3, 864, SET) = 864
mmap(0, 1857568, READ, PRIVATE|DENYWRITE, 3, 0) = 0x7ff61c4cd000
mmap(0x7ff61c4ef000, 1363968, EXEC|READ, PRIVATE|FIXED|DENYWRITE, 3, 139264) = 0x7ff61c4ef000
mmap(0x7ff61c63c000, 311296, READ, PRIVATE|FIXED|DENYWRITE, 3, 1503232) = 0x7ff61c63c000
mmap(0x7ff61c689000, 24576, READ|WRITE, PRIVATE|FIXED|DENYWRITE, 3, 1814528) = 0x7ff61c689000
mmap(0x7ff61c68f000, 14368, READ|WRITE, PRIVATE|FIXED|ANONYMOUS) = 0x7ff61c68f000
munmap(0x7ff61c695000, 109093) = 0
openat("/usr/lib/locale/locale-archive", CLOEXEC) = 3
mmap(0, 217749968, READ, PRIVATE, 3, 0) = 0x7ff60f523000
#
# vim ~/.perfconfig
#
# perf config
llvm.dump-obj=true
trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
trace.show_zeros=yes
trace.show_duration=no
trace.no_inherit=yes
trace.show_timestamp=no
trace.show_arg_names=no
trace.args_alignment=0
trace.string_quote="
trace.show_prefix=yes
#
#
# trace -e *map,open*,*seek sleep 1
openat(AT_FDCWD, "/etc/ld.so.cache", O_CLOEXEC) = 3
mmap(0, 109093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7ebbe59000
openat(AT_FDCWD, "/lib64/libc.so.6", O_CLOEXEC) = 3
lseek(3, 792, SEEK_SET) = 792
mmap(0, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f7ebbe57000
lseek(3, 792, SEEK_SET) = 792
lseek(3, 864, SEEK_SET) = 864
mmap(0, 1857568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7ebbc91000
mmap(0x7f7ebbcb3000, 1363968, PROT_EXEC|PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 139264) = 0x7f7ebbcb3000
mmap(0x7f7ebbe00000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1503232) = 0x7f7ebbe00000
mmap(0x7f7ebbe4d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 1814528) = 0x7f7ebbe4d000
mmap(0x7f7ebbe53000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f7ebbe53000
munmap(0x7f7ebbe59000, 109093) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_CLOEXEC) = 3
mmap(0, 217749968, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7eaece7000
#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mtn1i4rjowjl72trtnbmvjd4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
/linux-master/net/wireless/
H A Dsysfs.cdiff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff b3ef5520 Tue Mar 28 02:11:31 MDT 2017 Arend Van Spriel <arend.vanspriel@broadcom.com> cfg80211: check rdev resume callback only for registered wiphy

We got the following use-after-free KASAN report:

BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.

Cc: stable@vger.kernel.org # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
/linux-master/arch/parisc/mm/
H A Dfixmap.cdiff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
diff c2ff2b73 Thu Aug 03 00:24:04 MDT 2023 Mike Rapoport (IBM) <rppt@kernel.org> parisc/mm: preallocate fixmap page tables at init

Christoph Biedl reported early OOM on recent kernels:

swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4

Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600

This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().

Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
/linux-master/drivers/net/ethernet/mellanox/mlx5/core/en/tc/
H A Dpost_act.hdiff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff e9fce818 Wed Mar 22 02:52:26 MDT 2023 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Don't clone flow post action attributes second time

The code already clones post action attributes in
mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
mlx5e_tc_post_act_add() is a erroneous leftover from original
implementation. Instead, assign handle->attribute to post_attr provided by
the caller. Note that cloning the attribute second time is not just
wasteful but also causes issues like second copy not being properly updated
in neigh update code which leads to following use-after-free:

Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK>
Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d
Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200
Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
--
Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30
Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40
Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0
Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
/linux-master/drivers/net/wireless/ath/ath10k/
H A Dahb.cdiff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
diff e2f8b74e Mon Dec 14 23:35:04 MST 2020 Wen Gong <wgong@codeaurora.org> ath10k: prevent deinitializing NAPI twice

It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
test simulate crash and ifconfig down/rmmod meanwhile.

Test steps:

1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci

2. dmesg:
[ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
[ 5622.655995] ieee80211 phy0: Hardware restart was requested
[ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
[ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
[ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
[ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137
[ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
[ 5776.359855] Call trace:
[ 5776.359868] dump_backtrace+0x0/0x170
[ 5776.359881] show_stack+0x20/0x2c
[ 5776.359896] dump_stack+0xd4/0x10c
[ 5776.359916] panic+0x12c/0x29c
[ 5776.359937] hung_task_panic+0x0/0x50
[ 5776.359953] kthread+0x120/0x130
[ 5776.359965] ret_from_fork+0x10/0x18
[ 5776.359986] SMP: stopping secondary CPUs
[ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
[ 5776.360026] CPU features: 0x0,2188200c
[ 5776.360035] Memory Limit: none

command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
callstack of ifconfig:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] __dev_change_flags+0xe0/0x1d0
[<0>] dev_change_flags+0x30/0x6c
[<0>] devinet_ioctl+0x370/0x564
[<0>] inet_ioctl+0xdc/0x304
[<0>] sock_do_ioctl+0x50/0x288
[<0>] compat_sock_ioctl+0x1b4/0x1aac
[<0>] __se_compat_sys_ioctl+0x100/0x26fc
[<0>] __arm64_compat_sys_ioctl+0x20/0x2c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

callstack of rmmod:
[<0>] __switch_to+0x120/0x13c
[<0>] msleep+0x28/0x38
[<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
[<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
[<0>] ath10k_halt+0x120/0x178 [ath10k_core]
[<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
[<0>] drv_stop+0xe0/0x1e4 [mac80211]
[<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
[<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
[<0>] ieee80211_stop+0x20/0x30 [mac80211]
[<0>] __dev_close_many+0xb8/0x11c
[<0>] dev_close_many+0x70/0x100
[<0>] dev_close+0x4c/0x80
[<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
[<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
[<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
[<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
[<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
[<0>] sdio_bus_remove+0x48/0x108
[<0>] device_release_driver_internal+0x138/0x1ec
[<0>] driver_detach+0x6c/0xa8
[<0>] bus_remove_driver+0x78/0xa8
[<0>] driver_unregister+0x30/0x50
[<0>] sdio_unregister_driver+0x28/0x34
[<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
[<0>] __arm64_sys_delete_module+0x1e0/0x22c
[<0>] el0_svc_common+0xa4/0x154
[<0>] el0_svc_compat_handler+0x2c/0x38
[<0>] el0_svc_compat+0x8/0x18
[<0>] 0xffffffffffffffff

SNOC:
[ 647.156863] Call trace:
[ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
[ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
[ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
[ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
[ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
[ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38
[ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
[ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
[ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
[ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
[ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
[ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
[ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
[ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
[ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
[ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
[ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
[ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
[ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
[ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
[ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
[ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
[ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
[ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
[ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
[ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
[ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
[ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
[ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
[ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
[ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c

PCIe:
[ 615.392770] rmmod D 0 3523 3458 0x00000080
[ 615.392777] Call Trace:
[ 615.392784] __schedule+0x617/0x7d3
[ 615.392791] ? __mod_timer+0x263/0x35c
[ 615.392797] schedule+0x62/0x72
[ 615.392803] schedule_timeout+0x8d/0xf3
[ 615.392809] ? run_local_timers+0x6b/0x6b
[ 615.392814] msleep+0x1b/0x22
[ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
[ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core]
[ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core]
[ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core]
[ 615.392893] drv_stop+0x9a/0x13c [mac80211]
[ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211]
[ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211]
[ 615.392945] __dev_close_many+0x9e/0xf0
[ 615.392952] dev_close_many+0x62/0xe8
[ 615.392958] dev_close+0x54/0x7d
[ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
[ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
[ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211]
[ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core]
[ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core]
[ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
[ 615.393117] pci_device_remove+0x41/0xa6
[ 615.393129] device_release_driver_internal+0x123/0x1ec
[ 615.393140] driver_detach+0x60/0x90
[ 615.393152] bus_remove_driver+0x72/0x9f
[ 615.393164] pci_unregister_driver+0x1e/0x87
[ 615.393177] SyS_delete_module+0x1d7/0x277
[ 615.393188] do_syscall_64+0x6b/0xf7
[ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6

The test command run simulate_fw_crash firstly and it call into
ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
is called and bit NAPI_STATE_SCHED is set. After that, function
ath10k_sdio_hif_stop is called again from ath10k_stop by command
"ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.

It is blocked by napi_synchronize, napi_disable will set bit with
NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
becuase bit NAPI_STATE_SCHED is set by napi_disable.

function of napi_synchronize
static inline void napi_synchronize(const struct napi_struct *n)
{
if (IS_ENABLED(CONFIG_SMP))
while (test_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
else
barrier();
}

function of napi_disable
void napi_disable(struct napi_struct *n)
{
might_sleep();
set_bit(NAPI_STATE_DISABLE, &n->state);

while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep(1);
while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
msleep(1);

hrtimer_cancel(&n->timer);

clear_bit(NAPI_STATE_DISABLE, &n->state);
}

Add flag for it avoid the hang and crash.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1598617348-2325-1-git-send-email-wgong@codeaurora.org
/linux-master/drivers/media/platform/qcom/venus/
H A Dpm_helpers.cdiff 0f6e8d8c Tue Sep 13 00:37:00 MDT 2022 Tang Bin <tangbin@cmss.chinamobile.com> venus: pm_helpers: Fix error check in vcodec_domains_get()

In the function vcodec_domains_get(), dev_pm_domain_attach_by_name()
may return NULL in some cases, so IS_ERR() doesn't meet the
requirements. Thus fix it.

Fixes: 7482a983dea3 ("media: venus: redesign clocks and pm domains control")
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
diff 1d95af02 Mon Aug 01 09:16:41 MDT 2022 Stanimir Varbanov <stanimir.varbanov@linaro.org> venus: pm_helpers: Fix warning in OPP during probe

Fix the following WARN triggered during Venus driver probe on
5.19.0-rc8-next-20220728:

WARNING: CPU: 7 PID: 339 at drivers/opp/core.c:2471 dev_pm_opp_set_config+0x49c/0x610
Modules linked in: qcom_spmi_adc5 rtc_pm8xxx qcom_spmi_adc_tm5 leds_qcom_lpg led_class_multicolor
qcom_pon qcom_vadc_common venus_core(+) qcom_spmi_temp_alarm v4l2_mem2mem videobuf2_v4l2 msm(+)
videobuf2_common crct10dif_ce spi_geni_qcom snd_soc_sm8250 i2c_qcom_geni gpu_sched
snd_soc_qcom_common videodev qcom_q6v5_pas soundwire_qcom drm_dp_aux_bus qcom_stats
drm_display_helper qcom_pil_info soundwire_bus snd_soc_lpass_va_macro mc qcom_q6v5
phy_qcom_snps_femto_v2 qcom_rng snd_soc_lpass_macro_common snd_soc_lpass_wsa_macro
lpass_gfm_sm8250 slimbus qcom_sysmon qcom_common qcom_glink_smem qmi_helpers
qcom_wdt mdt_loader socinfo icc_osm_l3 display_connector
drm_kms_helper qnoc_sm8250 drm fuse ip_tables x_tables ipv6
CPU: 7 PID: 339 Comm: systemd-udevd Not tainted 5.19.0-rc8-next-20220728 #4
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x49c/0x610
lr : dev_pm_opp_set_config+0x58/0x610
sp : ffff8000093c3710
x29: ffff8000093c3710 x28: ffffbca3959d82b8 x27: ffff8000093c3d00
x26: ffffbca3959d8e08 x25: ffff4396cac98118 x24: ffff4396c0e24810
x23: ffff4396c4272c40 x22: ffff4396c0e24810 x21: ffff8000093c3810
x20: ffff4396cac36800 x19: ffff4396cac96800 x18: 0000000000000000
x17: 0000000000000003 x16: ffffbca3f4edf198 x15: 0000001cba64a858
x14: 0000000000000180 x13: 000000000000017e x12: 0000000000000000
x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff8000093c35c0
x8 : ffff4396c4273700 x7 : ffff43983efca6c0 x6 : ffff43983efca640
x5 : 00000000410fd0d0 x4 : ffff4396c4272c40 x3 : ffffbca3f5d1e008
x2 : 0000000000000000 x1 : ffff4396c2421600 x0 : ffff4396cac96860
Call trace:
dev_pm_opp_set_config+0x49c/0x610
devm_pm_opp_set_config+0x18/0x70
vcodec_domains_get+0xb8/0x1638 [venus_core]
core_get_v4+0x1d8/0x218 [venus_core]
venus_probe+0xf4/0x468 [venus_core]
platform_probe+0x68/0xd8
really_probe+0xbc/0x2a8
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0xf0
__driver_attach+0x70/0x120
bus_for_each_dev+0x70/0xc0
driver_attach+0x24/0x30
bus_add_driver+0x150/0x200
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
qcom_venus_driver_init+0x24/0x1000 [venus_core]
do_one_initcall+0x54/0x1c8
do_init_module+0x44/0x1d0
load_module+0x16c8/0x1aa0
__do_sys_finit_module+0xbc/0x110
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xcc/0xf0
do_el0_svc+0x2c/0xb8
el0_svc+0x2c/0x88
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x18c/0x190
qcom-venus: probe of aa00000.video-codec failed with error -16

The fix is re-ordering the code related to OPP core. The OPP core
expects all configuration options to be provided before the OPP
table is added.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
/linux-master/net/sched/
H A Dact_ife.cdiff c2a67de9 Fri Dec 29 06:26:41 MST 2023 Pedro Tammela <pctammela@mojatatu.com> net/sched: introduce ACT_P_BOUND return code

Bound actions always return '0' and as of today we rely on '0'
being returned in order to properly skip bound actions in
tcf_idr_insert_many. In order to further improve maintainability,
introduce the ACT_P_BOUND return code.

Actions are updated to return 'ACT_P_BOUND' instead of plain '0'.
tcf_idr_insert_many is then updated to check for 'ACT_P_BOUND'.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231229132642.1489088-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff c2a67de9 Fri Dec 29 06:26:41 MST 2023 Pedro Tammela <pctammela@mojatatu.com> net/sched: introduce ACT_P_BOUND return code

Bound actions always return '0' and as of today we rely on '0'
being returned in order to properly skip bound actions in
tcf_idr_insert_many. In order to further improve maintainability,
introduce the ACT_P_BOUND return code.

Actions are updated to return 'ACT_P_BOUND' instead of plain '0'.
tcf_idr_insert_many is then updated to check for 'ACT_P_BOUND'.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231229132642.1489088-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff c2a67de9 Fri Dec 29 06:26:41 MST 2023 Pedro Tammela <pctammela@mojatatu.com> net/sched: introduce ACT_P_BOUND return code

Bound actions always return '0' and as of today we rely on '0'
being returned in order to properly skip bound actions in
tcf_idr_insert_many. In order to further improve maintainability,
introduce the ACT_P_BOUND return code.

Actions are updated to return 'ACT_P_BOUND' instead of plain '0'.
tcf_idr_insert_many is then updated to check for 'ACT_P_BOUND'.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231229132642.1489088-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 44c23d71 Wed Jan 15 09:20:39 MST 2020 Eric Dumazet <edumazet@google.com> net/sched: act_ife: initalize ife->metalist earlier

It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
/linux-master/drivers/gpu/drm/meson/
H A Dmeson_encoder_hdmi.cdiff b334be86 Tue Jan 23 12:37:18 MST 2024 Jani Nikula <jani.nikula@intel.com> drm/meson: switch to drm_bridge_edid_read()

Prefer using the struct drm_edid based functions.

Not ideal, should use source physical address from connector info.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/0a6556b4abaa341b5a3b9b466dbb23714369f7e1.1706038510.git.jani.nikula@intel.com
diff 099f0af9 Thu Sep 14 07:10:15 MDT 2023 Jani Nikula <jani.nikula@intel.com> drm/meson: fix memory leak on ->hpd_notify callback

The EDID returned by drm_bridge_get_edid() needs to be freed.

Fixes: 0af5e0b41110 ("drm/meson: encoder_hdmi: switch to bridge DRM_BRIDGE_ATTACH_NO_CONNECTOR")
Cc: Neil Armstrong <narmstrong@baylibre.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: Neil Armstrong <neil.armstrong@linaro.org>
Cc: Kevin Hilman <khilman@baylibre.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-amlogic@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230914131015.2472029-1-jani.nikula@intel.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
diff 09847723 Tue Sep 20 16:28:42 MDT 2022 Adrián Larumbe <adrian.larumbe@collabora.com> drm/meson: remove drm bridges at aggregate driver unbind time

drm bridges added by meson_encoder_hdmi_init and meson_encoder_cvbs_init
were not manually removed at module unload time, which caused dangling
references to freed memory to remain linked in the global bridge_list.

When loading the driver modules back in, the same functions would again
call drm_bridge_add, and when traversing the global bridge_list, would
end up peeking into freed memory.

Once again KASAN revealed the problem:

[ +0.000095] =============================================================
[ +0.000008] BUG: KASAN: use-after-free in __list_add_valid+0x9c/0x120
[ +0.000018] Read of size 8 at addr ffff00003da291f0 by task modprobe/2483

[ +0.000018] CPU: 3 PID: 2483 Comm: modprobe Tainted: G C O 5.19.0-rc6-lrmbkasan+ #1
[ +0.000011] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ +0.000008] Call trace:
[ +0.000006] dump_backtrace+0x1ec/0x280
[ +0.000012] show_stack+0x24/0x80
[ +0.000008] dump_stack_lvl+0x98/0xd4
[ +0.000011] print_address_description.constprop.0+0x80/0x520
[ +0.000011] print_report+0x128/0x260
[ +0.000008] kasan_report+0xb8/0xfc
[ +0.000008] __asan_report_load8_noabort+0x3c/0x50
[ +0.000009] __list_add_valid+0x9c/0x120
[ +0.000009] drm_bridge_add+0x6c/0x104 [drm]
[ +0.000165] dw_hdmi_probe+0x1900/0x2360 [dw_hdmi]
[ +0.000022] meson_dw_hdmi_bind+0x520/0x814 [meson_dw_hdmi]
[ +0.000014] component_bind+0x174/0x520
[ +0.000012] component_bind_all+0x1a8/0x38c
[ +0.000010] meson_drv_bind_master+0x5e8/0xb74 [meson_drm]
[ +0.000032] meson_drv_bind+0x20/0x2c [meson_drm]
[ +0.000027] try_to_bring_up_aggregate_device+0x19c/0x390
[ +0.000010] component_master_add_with_match+0x1c8/0x284
[ +0.000009] meson_drv_probe+0x274/0x280 [meson_drm]
[ +0.000026] platform_probe+0xd0/0x220
[ +0.000009] really_probe+0x3ac/0xa80
[ +0.000009] __driver_probe_device+0x1f8/0x400
[ +0.000009] driver_probe_device+0x68/0x1b0
[ +0.000009] __driver_attach+0x20c/0x480
[ +0.000008] bus_for_each_dev+0x114/0x1b0
[ +0.000009] driver_attach+0x48/0x64
[ +0.000008] bus_add_driver+0x390/0x564
[ +0.000009] driver_register+0x1a8/0x3e4
[ +0.000009] __platform_driver_register+0x6c/0x94
[ +0.000008] meson_drm_platform_driver_init+0x3c/0x1000 [meson_drm]
[ +0.000027] do_one_initcall+0xc4/0x2b0
[ +0.000011] do_init_module+0x154/0x570
[ +0.000011] load_module+0x1a78/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000009] __arm64_sys_init_module+0x78/0xb0
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000012] el0t_64_sync_handler+0x11c/0x150
[ +0.000008] el0t_64_sync+0x18c/0x190

[ +0.000016] Allocated by task 879:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000011] __kasan_kmalloc+0x90/0xd0
[ +0.000007] __kmalloc+0x278/0x4a0
[ +0.000011] mpi_resize+0x13c/0x1d0
[ +0.000011] mpi_powm+0xd24/0x1570
[ +0.000009] rsa_enc+0x1a4/0x30c
[ +0.000009] pkcs1pad_verify+0x3f0/0x580
[ +0.000009] public_key_verify_signature+0x7a8/0xba4
[ +0.000010] public_key_verify_signature_2+0x40/0x60
[ +0.000008] verify_signature+0xb4/0x114
[ +0.000008] pkcs7_validate_trust_one.constprop.0+0x3b8/0x574
[ +0.000009] pkcs7_validate_trust+0xb8/0x15c
[ +0.000008] verify_pkcs7_message_sig+0xec/0x1b0
[ +0.000012] verify_pkcs7_signature+0x78/0xac
[ +0.000007] mod_verify_sig+0x110/0x190
[ +0.000009] module_sig_check+0x114/0x1e0
[ +0.000009] load_module+0xa0/0x1ea4
[ +0.000008] __do_sys_init_module+0x184/0x1cc
[ +0.000008] __arm64_sys_init_module+0x78/0xb0
[ +0.000008] invoke_syscall+0x74/0x260
[ +0.000009] el0_svc_common.constprop.0+0x1a8/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000009] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] Freed by task 2422:
[ +0.000008] kasan_save_stack+0x2c/0x5c
[ +0.000009] kasan_set_track+0x2c/0x40
[ +0.000007] kasan_set_free_info+0x28/0x50
[ +0.000009] ____kasan_slab_free+0x128/0x1d4
[ +0.000008] __kasan_slab_free+0x18/0x24
[ +0.000007] slab_free_freelist_hook+0x108/0x230
[ +0.000010] kfree+0x110/0x35c
[ +0.000008] release_nodes+0xf0/0x16c
[ +0.000009] devres_release_group+0x180/0x270
[ +0.000008] take_down_aggregate_device+0xcc/0x160
[ +0.000010] component_del+0x18c/0x360
[ +0.000009] meson_dw_hdmi_remove+0x28/0x40 [meson_dw_hdmi]
[ +0.000013] platform_remove+0x64/0xb0
[ +0.000008] device_remove+0xb8/0x154
[ +0.000009] device_release_driver_internal+0x398/0x5b0
[ +0.000009] driver_detach+0xac/0x1b0
[ +0.000009] bus_remove_driver+0x158/0x29c
[ +0.000008] driver_unregister+0x70/0xb0
[ +0.000009] platform_driver_unregister+0x20/0x2c
[ +0.000007] meson_dw_hdmi_platform_driver_exit+0x1c/0x30 [meson_dw_hdmi]
[ +0.000012] __do_sys_delete_module+0x288/0x400
[ +0.000009] __arm64_sys_delete_module+0x5c/0x80
[ +0.000009] invoke_syscall+0x74/0x260
[ +0.000008] el0_svc_common.constprop.0+0xcc/0x260
[ +0.000008] do_el0_svc+0x50/0x70
[ +0.000007] el0_svc+0x68/0x1a0
[ +0.000008] el0t_64_sync_handler+0x11c/0x150
[ +0.000009] el0t_64_sync+0x18c/0x190

[ +0.000013] The buggy address belongs to the object at ffff00003da29000
which belongs to the cache kmalloc-1k of size 1024
[ +0.000008] The buggy address is located 496 bytes inside of
1024-byte region [ffff00003da29000, ffff00003da29400)

[ +0.000015] The buggy address belongs to the physical page:
[ +0.000009] page:fffffc0000f68a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3da28
[ +0.000012] head:fffffc0000f68a00 order:3 compound_mapcount:0 compound_pincount:0
[ +0.000009] flags: 0xffff00000010200(slab|head|node=0|zone=0|lastcpupid=0xffff)
[ +0.000019] raw: 0ffff00000010200 fffffc0000eb5c08 fffffc0000d96608 ffff000000002a80
[ +0.000008] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ +0.000008] page dumped because: kasan: bad access detected

[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff00003da29080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ffff00003da29100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] >ffff00003da29180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ^
[ +0.000008] ffff00003da29200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000006] ffff00003da29280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================

Fix by keeping track of which encoders were initialised in the meson_drm
structure and manually removing their bridges at aggregate driver's unbind
time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920222842.1053234-1-adrian.larumbe@collabora.com
/linux-master/tools/testing/selftests/bpf/progs/
H A Dconnect4_dropper.c43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
43d2b88c Mon Sep 13 17:07:59 MDT 2021 Daniel Borkmann <daniel@iogearbox.net> bpf, selftests: Add test case for mixed cgroup v1/v2

Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
H A Dlsm_cgroup.cdiff 0db63c0b Fri Apr 26 18:25:44 MDT 2024 Alexei Starovoitov <ast@kernel.org> bpf: Fix verifier assumptions about socket->sk

The verifier assumes that 'sk' field in 'struct socket' is valid
and non-NULL when 'socket' pointer itself is trusted and non-NULL.
That may not be the case when socket was just created and
passed to LSM socket_accept hook.
Fix this verifier assumption and adjust tests.

Reported-by: Liam Wisehart <liamwisehart@meta.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Fixes: 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the verifier.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20240427002544.68803-1-alexei.starovoitov@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
diff c453e64c Mon Nov 14 20:29:40 MST 2022 Wang Yufen <wangyufen@huawei.com> selftests/bpf: fix memory leak of lsm_cgroup

kmemleak reports this issue:

unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak

The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed

Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
/linux-master/arch/loongarch/include/asm/vdso/
H A Dgettimeofday.hdiff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
diff aa5e65dc Thu Jun 29 06:58:43 MDT 2023 Tiezhu Yang <yangtiezhu@loongson.cn> LoongArch: Add support to clone a time namespace

We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS which
depends on GENERIC_VDSO_TIME_NS, select GENERIC_VDSO_TIME_NS to enable
CONFIG_TIME_NS to build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page size for
generic vdso data, expand another page size for timens vdso data and
assign LOONGARCH_VDSO_DATA_SIZE (maybe exceeds a page size if expand in
the future) for loongarch vdso data, at last add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
# cd tools/testing/selftests/clone3 && make && ./clone3
...
ok 18 [739] Result (0) matches expectation (0)
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
# cd tools/testing/selftests/timens && make && ./timens
...
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
/linux-master/drivers/gpu/drm/nouveau/
H A Dnouveau_connector.hdiff 64d17f25 Thu Oct 24 02:52:53 MDT 2019 Hans de Goede <hdegoede@redhat.com> drm/nouveau: Fix drm-core using atomic code-paths on pre-nv50 hardware

We do not support atomic modesetting on pre-nv50 hardware, but until now
our connector code was setting drm_connector->state on pre-nv50 hardware.

This causes the core to enter atomic modesetting paths in at least:

1. drm_connector_get_encoder(), returning connector->state->best_encoder
which is always 0, causing us to always report 0 as encoder_id in
the drmModeConnector struct returned by drmModeGetConnector().

2. drm_encoder_get_crtc(), returning NULL because uses_atomic get set,
causing us to always report 0 as crtc_id in the drmModeEncoder struct
returned by drmModeGetEncoder()

Which in turn confuses userspace, at least plymouth thinks that the pipe
has changed because of this and tries to reconfigure it unnecessarily.

More in general we should not set drm_connector->state in the non-atomic
code as this violates the drm-core's expectations.

This commit fixes this by using a nouveau_conn_atom struct embedded in the
nouveau_connector struct for property handling in the non-atomic case.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1706557
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 64d17f25 Thu Oct 24 02:52:53 MDT 2019 Hans de Goede <hdegoede@redhat.com> drm/nouveau: Fix drm-core using atomic code-paths on pre-nv50 hardware

We do not support atomic modesetting on pre-nv50 hardware, but until now
our connector code was setting drm_connector->state on pre-nv50 hardware.

This causes the core to enter atomic modesetting paths in at least:

1. drm_connector_get_encoder(), returning connector->state->best_encoder
which is always 0, causing us to always report 0 as encoder_id in
the drmModeConnector struct returned by drmModeGetConnector().

2. drm_encoder_get_crtc(), returning NULL because uses_atomic get set,
causing us to always report 0 as crtc_id in the drmModeEncoder struct
returned by drmModeGetEncoder()

Which in turn confuses userspace, at least plymouth thinks that the pipe
has changed because of this and tries to reconfigure it unnecessarily.

More in general we should not set drm_connector->state in the non-atomic
code as this violates the drm-core's expectations.

This commit fixes this by using a nouveau_conn_atom struct embedded in the
nouveau_connector struct for property handling in the non-atomic case.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1706557
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 64d17f25 Thu Oct 24 02:52:53 MDT 2019 Hans de Goede <hdegoede@redhat.com> drm/nouveau: Fix drm-core using atomic code-paths on pre-nv50 hardware

We do not support atomic modesetting on pre-nv50 hardware, but until now
our connector code was setting drm_connector->state on pre-nv50 hardware.

This causes the core to enter atomic modesetting paths in at least:

1. drm_connector_get_encoder(), returning connector->state->best_encoder
which is always 0, causing us to always report 0 as encoder_id in
the drmModeConnector struct returned by drmModeGetConnector().

2. drm_encoder_get_crtc(), returning NULL because uses_atomic get set,
causing us to always report 0 as crtc_id in the drmModeEncoder struct
returned by drmModeGetEncoder()

Which in turn confuses userspace, at least plymouth thinks that the pipe
has changed because of this and tries to reconfigure it unnecessarily.

More in general we should not set drm_connector->state in the non-atomic
code as this violates the drm-core's expectations.

This commit fixes this by using a nouveau_conn_atom struct embedded in the
nouveau_connector struct for property handling in the non-atomic case.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1706557
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff 6d757753 Thu Sep 06 15:43:23 MDT 2018 Lyude Paul <lyude@redhat.com> drm/nouveau: Move backlight device into nouveau_connector

Currently module unloading is broken in nouveau due to a rather annoying
race condition resulting from nouveau_backlight.c having gone a bit
stale over time:

[ 1960.791143] ==================================================================
[ 1960.791394] BUG: KASAN: use-after-free in nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791460] Read of size 4 at addr ffff88075accf350 by task zsh/11185
[ 1960.791521]
[ 1960.791545] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G O 4.18.0Lyude-Test+ #4
[ 1960.791580] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.791628] Call Trace:
[ 1960.791680] dump_stack+0xa4/0xfd
[ 1960.791721] print_address_description+0x71/0x239
[ 1960.791833] ? nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.791877] kasan_report.cold.6+0x242/0x2fe
[ 1960.791919] __asan_report_load4_noabort+0x19/0x20
[ 1960.792012] nouveau_backlight_exit+0x112/0x150 [nouveau]
[ 1960.792081] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.792150] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.792265] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.792347] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.792378] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.792406] ? trace_hardirqs_on+0xd/0x10
[ 1960.792472] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.792502] pci_device_remove+0x112/0x2d0
[ 1960.792530] ? pcibios_free_irq+0x10/0x10
[ 1960.792558] ? kasan_check_write+0x14/0x20
[ 1960.792587] device_release_driver_internal+0x35c/0x650
[ 1960.792617] device_release_driver+0x12/0x20
[ 1960.792643] pci_stop_bus_device+0x172/0x1e0
[ 1960.792671] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.792715] remove_store+0xcb/0xe0
[ 1960.792753] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.792779] ? __lock_is_held+0xb5/0x140
[ 1960.792808] ? component_add+0x530/0x530
[ 1960.792834] dev_attr_store+0x3f/0x70
[ 1960.792859] ? sysfs_file_ops+0x11d/0x170
[ 1960.792885] sysfs_kf_write+0x104/0x150
[ 1960.792915] ? sysfs_file_ops+0x170/0x170
[ 1960.792940] kernfs_fop_write+0x24f/0x400
[ 1960.792978] ? __lock_acquire+0x6ea/0x47f0
[ 1960.793021] __vfs_write+0xeb/0x760
[ 1960.793048] ? kernel_read+0x130/0x130
[ 1960.793076] ? __lock_is_held+0xb5/0x140
[ 1960.793107] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.793135] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.793162] ? __sb_start_write+0x183/0x220
[ 1960.793189] vfs_write+0x14d/0x4a0
[ 1960.793229] ksys_write+0xd2/0x1b0
[ 1960.793255] ? __ia32_sys_read+0xb0/0xb0
[ 1960.793298] ? fput+0x1d/0x120
[ 1960.793324] ? filp_close+0xf3/0x130
[ 1960.793349] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.793380] __x64_sys_write+0x73/0xb0
[ 1960.793407] do_syscall_64+0xaa/0x400
[ 1960.793433] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.793460] RIP: 0033:0x7f59df433164
[ 1960.793486] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.793541] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.793576] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.793620] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.793665] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.793696] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.793730] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.793768]
[ 1960.793790] Allocated by task 11167:
[ 1960.793816] save_stack+0x43/0xd0
[ 1960.793841] kasan_kmalloc+0xc4/0xe0
[ 1960.793880] kasan_slab_alloc+0x11/0x20
[ 1960.793905] kmem_cache_alloc+0xd7/0x270
[ 1960.793944] getname_flags+0xbd/0x520
[ 1960.793969] user_path_at_empty+0x23/0x50
[ 1960.793994] do_faccessat+0x1fc/0x5d0
[ 1960.794018] __x64_sys_access+0x59/0x80
[ 1960.794043] do_syscall_64+0xaa/0x400
[ 1960.794067] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794093]
[ 1960.794127] Freed by task 11167:
[ 1960.794152] save_stack+0x43/0xd0
[ 1960.794190] __kasan_slab_free+0x139/0x190
[ 1960.794215] kasan_slab_free+0xe/0x10
[ 1960.794239] kmem_cache_free+0xcb/0x2c0
[ 1960.794264] putname+0xad/0xe0
[ 1960.794287] filename_lookup.part.59+0x1f1/0x360
[ 1960.794313] user_path_at_empty+0x3e/0x50
[ 1960.794338] do_faccessat+0x1fc/0x5d0
[ 1960.794362] __x64_sys_access+0x59/0x80
[ 1960.794393] do_syscall_64+0xaa/0x400
[ 1960.794421] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.794461]
[ 1960.794483] The buggy address belongs to the object at ffff88075acceac0
[ 1960.794483] which belongs to the cache names_cache of size 4096
[ 1960.794540] The buggy address is located 2192 bytes inside of
[ 1960.794540] 4096-byte region [ffff88075acceac0, ffff88075accfac0)
[ 1960.794581] The buggy address belongs to the page:
[ 1960.794609] page:ffffea001d6b3200 count:1 mapcount:0 mapping:ffff880778e4b1c0 index:0x0 compound_mapcount: 0
[ 1960.794651] flags: 0x8000000000008100(slab|head)
[ 1960.794679] raw: 8000000000008100 ffffea001d39e808 ffffea001d39ea08 ffff880778e4b1c0
[ 1960.794739] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1960.794785] page dumped because: kasan: bad access detected
[ 1960.794813]
[ 1960.794834] Memory state around the buggy address:
[ 1960.794861] ffff88075accf200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794894] ffff88075accf280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794925] >ffff88075accf300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.794956] ^
[ 1960.794985] ffff88075accf380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795017] ffff88075accf400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1960.795061] ==================================================================
[ 1960.795106] Disabling lock debugging due to kernel taint
[ 1960.795131] ------------[ cut here ]------------
[ 1960.795148] ida_remove called for id=1802201963 which is not allocated.
[ 1960.795193] WARNING: CPU: 7 PID: 11185 at lib/idr.c:521 ida_remove+0x184/0x210
[ 1960.795213] Modules linked in: nouveau(O) mxm_wmi ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev vfat fat intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul iTCO_wdt psmouse wmi_bmof mei_me tpm_tis mei tpm_tis_core tpm i2c_i801 thinkpad_acpi pcc_cpufreq crc32c_intel serio_raw xhci_pci xhci_hcd wmi video i2c_dev i2c_core
[ 1960.795305] CPU: 7 PID: 11185 Comm: zsh Kdump: loaded Tainted: G B O 4.18.0Lyude-Test+ #4
[ 1960.795330] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET79W (1.52 ) 07/13/2018
[ 1960.795352] RIP: 0010:ida_remove+0x184/0x210
[ 1960.795370] Code: 4c 89 f7 e8 ae c8 00 00 eb 22 41 83 c4 02 4c 89 e8 41 83 fc 3f 0f 86 64 ff ff ff 44 89 fe 48 c7 c7 20 94 1e 83 e8 54 ed 81 fe <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 01 c3 c7 03 00 00 00 00 c7
[ 1960.795402] RSP: 0018:ffff88074d4df7b8 EFLAGS: 00010082
[ 1960.795421] RAX: 0000000000000000 RBX: 1ffff100e9a9befa RCX: ffffffff81479975
[ 1960.795440] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88077c1de690
[ 1960.795460] RBP: ffff88074d4df878 R08: ffffed00ef83bcd3 R09: ffffed00ef83bcd2
[ 1960.795479] R10: ffffed00ef83bcd2 R11: ffff88077c1de697 R12: 000000000000036b
[ 1960.795498] R13: 0000000000000202 R14: ffffffffa0aa7fa0 R15: 000000006b6b6b6b
[ 1960.795518] FS: 00007f59e0995b80(0000) GS:ffff88077c1c0000(0000) knlGS:0000000000000000
[ 1960.795553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1960.795571] CR2: 00007f59e09a2010 CR3: 00000004a1a70005 CR4: 00000000003606e0
[ 1960.795596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1960.795629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1960.795649] Call Trace:
[ 1960.795667] ? ida_destroy+0x1d0/0x1d0
[ 1960.795686] ? kasan_check_write+0x14/0x20
[ 1960.795704] ? do_raw_spin_lock+0xc2/0x1c0
[ 1960.795724] ida_simple_remove+0x26/0x40
[ 1960.795794] nouveau_backlight_exit+0x9d/0x150 [nouveau]
[ 1960.795867] nouveau_display_destroy+0x76/0x150 [nouveau]
[ 1960.795930] nouveau_drm_device_fini+0xb7/0x190 [nouveau]
[ 1960.795989] nouveau_drm_device_remove+0x14b/0x1d0 [nouveau]
[ 1960.796047] ? nouveau_cli_work_queue+0x2e0/0x2e0 [nouveau]
[ 1960.796067] ? trace_hardirqs_on_caller+0x38b/0x570
[ 1960.796089] ? trace_hardirqs_on+0xd/0x10
[ 1960.796146] nouveau_drm_remove+0x37/0x50 [nouveau]
[ 1960.796167] pci_device_remove+0x112/0x2d0
[ 1960.796186] ? pcibios_free_irq+0x10/0x10
[ 1960.796218] ? kasan_check_write+0x14/0x20
[ 1960.796237] device_release_driver_internal+0x35c/0x650
[ 1960.796257] device_release_driver+0x12/0x20
[ 1960.796289] pci_stop_bus_device+0x172/0x1e0
[ 1960.796308] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 1960.796328] remove_store+0xcb/0xe0
[ 1960.796345] ? sriov_numvfs_store+0x2e0/0x2e0
[ 1960.796364] ? __lock_is_held+0xb5/0x140
[ 1960.796383] ? component_add+0x530/0x530
[ 1960.796401] dev_attr_store+0x3f/0x70
[ 1960.796419] ? sysfs_file_ops+0x11d/0x170
[ 1960.796436] sysfs_kf_write+0x104/0x150
[ 1960.796454] ? sysfs_file_ops+0x170/0x170
[ 1960.796471] kernfs_fop_write+0x24f/0x400
[ 1960.796488] ? __lock_acquire+0x6ea/0x47f0
[ 1960.796520] __vfs_write+0xeb/0x760
[ 1960.796538] ? kernel_read+0x130/0x130
[ 1960.796556] ? __lock_is_held+0xb5/0x140
[ 1960.796590] ? rcu_read_lock_sched_held+0xdd/0x110
[ 1960.796608] ? rcu_sync_lockdep_assert+0x78/0xb0
[ 1960.796626] ? __sb_start_write+0x183/0x220
[ 1960.796648] vfs_write+0x14d/0x4a0
[ 1960.796666] ksys_write+0xd2/0x1b0
[ 1960.796684] ? __ia32_sys_read+0xb0/0xb0
[ 1960.796701] ? fput+0x1d/0x120
[ 1960.796732] ? filp_close+0xf3/0x130
[ 1960.796749] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[ 1960.796768] __x64_sys_write+0x73/0xb0
[ 1960.796800] do_syscall_64+0xaa/0x400
[ 1960.796818] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1960.796836] RIP: 0033:0x7f59df433164
[ 1960.796854] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 81 38 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1960.796884] RSP: 002b:00007ffd70ee2fb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1960.796906] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f59df433164
[ 1960.796926] RDX: 0000000000000002 RSI: 00005578088640c0 RDI: 0000000000000001
[ 1960.796946] RBP: 00005578088640c0 R08: 00007f59df7038c0 R09: 00007f59e0995b80
[ 1960.796966] R10: 000000000000000a R11: 0000000000000246 R12: 00007f59df702760
[ 1960.796985] R13: 0000000000000002 R14: 00007f59df6fd760 R15: 0000000000000002
[ 1960.797008] irq event stamp: 509990
[ 1960.797026] hardirqs last enabled at (509989): [<ffffffff8119ff78>] flush_work+0x4b8/0x6d0
[ 1960.797063] hardirqs last disabled at (509990): [<ffffffff8297c395>] _raw_spin_lock_irqsave+0x25/0x60
[ 1960.797085] softirqs last enabled at (509744): [<ffffffff82c005ad>] __do_softirq+0x5ad/0x8c0
[ 1960.797121] softirqs last disabled at (509735): [<ffffffff8115aa15>] irq_exit+0x1a5/0x1e0
[ 1960.797142] ---[ end trace fb1342325f1846b8 ]---

While I haven't actually gone into the details of what's causing this to
happen (maybe the kernel removes the backlight device in the device core
before we get to it?), it doesn't really matter anyway because the way
nouveau handles backlights has long since been deprecated.

According to the documentation on the drm_connector->late_register()
hook, the ->late_register() hook should be used for adding extra
connector-related devices. Vice versa, the ->early_unregister() hook is
meant to be used for removing those devices.

So: gut nouveau_drm->bl_list and nouveau_drm->backlight, and replace
them with per-connector backlight structures. Additionally, move
backlight registration/teardown into the ->late_register() and
->early_unregister() hooks so that DRM can give us a chance to remove
the backlight before the connector is even removed. This appears to fix
the problem once and for all.

Changes since v2:
- Use NV_INFO_ONCE for printing GMUX information, since otherwise this
will end up printing that message for as many times as we have
connectors

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
/linux-master/drivers/soc/aspeed/
H A Daspeed-socinfo.cdiff 8812dff6 Tue Aug 17 19:05:34 MDT 2021 Joel Stanley <joel@jms.id.au> soc: aspeed: socinfo: Add AST2625 variant

Add AST26XX series AST2625-A3 SOC ID, taken from the vendor u-boot SDK:

arch/arm/mach-aspeed/ast2600/scu_info.c
+ SOC_ID("AST2625-A3", 0x0503040305030403),

Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com>
Link: https://lore.kernel.org/r/20210818010534.2508122-1-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
diff d0e72be7 Wed Feb 10 04:46:51 MST 2021 Joel Stanley <joel@jms.id.au> soc: aspeed: socinfo: Add new systems

Aspeed's u-boot sdk has been updated with the SoC IDs for the AST2605
variant, as well as A2 and A3 variants of the 2600 family.

>From u-boot's arch/arm/mach-aspeed/ast2600/scu_info.c:

SOC_ID("AST2600-A0", 0x0500030305000303),
SOC_ID("AST2600-A1", 0x0501030305010303),
SOC_ID("AST2620-A1", 0x0501020305010203),
SOC_ID("AST2600-A2", 0x0502030305010303),
SOC_ID("AST2620-A2", 0x0502020305010203),
SOC_ID("AST2605-A2", 0x0502010305010103),
SOC_ID("AST2600-A3", 0x0503030305030303),
SOC_ID("AST2620-A3", 0x0503020305030203),
SOC_ID("AST2605-A3", 0x0503010305030103),

Fixes: e0218dca5787 ("soc: aspeed: Add soc info driver")
Link: https://lore.kernel.org/r/20210210114651.334324-1-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
diff d0e72be7 Wed Feb 10 04:46:51 MST 2021 Joel Stanley <joel@jms.id.au> soc: aspeed: socinfo: Add new systems

Aspeed's u-boot sdk has been updated with the SoC IDs for the AST2605
variant, as well as A2 and A3 variants of the 2600 family.

>From u-boot's arch/arm/mach-aspeed/ast2600/scu_info.c:

SOC_ID("AST2600-A0", 0x0500030305000303),
SOC_ID("AST2600-A1", 0x0501030305010303),
SOC_ID("AST2620-A1", 0x0501020305010203),
SOC_ID("AST2600-A2", 0x0502030305010303),
SOC_ID("AST2620-A2", 0x0502020305010203),
SOC_ID("AST2605-A2", 0x0502010305010103),
SOC_ID("AST2600-A3", 0x0503030305030303),
SOC_ID("AST2620-A3", 0x0503020305030203),
SOC_ID("AST2605-A3", 0x0503010305030103),

Fixes: e0218dca5787 ("soc: aspeed: Add soc info driver")
Link: https://lore.kernel.org/r/20210210114651.334324-1-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
diff d0e72be7 Wed Feb 10 04:46:51 MST 2021 Joel Stanley <joel@jms.id.au> soc: aspeed: socinfo: Add new systems

Aspeed's u-boot sdk has been updated with the SoC IDs for the AST2605
variant, as well as A2 and A3 variants of the 2600 family.

>From u-boot's arch/arm/mach-aspeed/ast2600/scu_info.c:

SOC_ID("AST2600-A0", 0x0500030305000303),
SOC_ID("AST2600-A1", 0x0501030305010303),
SOC_ID("AST2620-A1", 0x0501020305010203),
SOC_ID("AST2600-A2", 0x0502030305010303),
SOC_ID("AST2620-A2", 0x0502020305010203),
SOC_ID("AST2605-A2", 0x0502010305010103),
SOC_ID("AST2600-A3", 0x0503030305030303),
SOC_ID("AST2620-A3", 0x0503020305030203),
SOC_ID("AST2605-A3", 0x0503010305030103),

Fixes: e0218dca5787 ("soc: aspeed: Add soc info driver")
Link: https://lore.kernel.org/r/20210210114651.334324-1-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
diff d0e72be7 Wed Feb 10 04:46:51 MST 2021 Joel Stanley <joel@jms.id.au> soc: aspeed: socinfo: Add new systems

Aspeed's u-boot sdk has been updated with the SoC IDs for the AST2605
variant, as well as A2 and A3 variants of the 2600 family.

>From u-boot's arch/arm/mach-aspeed/ast2600/scu_info.c:

SOC_ID("AST2600-A0", 0x0500030305000303),
SOC_ID("AST2600-A1", 0x0501030305010303),
SOC_ID("AST2620-A1", 0x0501020305010203),
SOC_ID("AST2600-A2", 0x0502030305010303),
SOC_ID("AST2620-A2", 0x0502020305010203),
SOC_ID("AST2605-A2", 0x0502010305010103),
SOC_ID("AST2600-A3", 0x0503030305030303),
SOC_ID("AST2620-A3", 0x0503020305030203),
SOC_ID("AST2605-A3", 0x0503010305030103),

Fixes: e0218dca5787 ("soc: aspeed: Add soc info driver")
Link: https://lore.kernel.org/r/20210210114651.334324-1-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
diff d0e72be7 Wed Feb 10 04:46:51 MST 2021 Joel Stanley <joel@jms.id.au> soc: aspeed: socinfo: Add new systems

Aspeed's u-boot sdk has been updated with the SoC IDs for the AST2605
variant, as well as A2 and A3 variants of the 2600 family.

>From u-boot's arch/arm/mach-aspeed/ast2600/scu_info.c:

SOC_ID("AST2600-A0", 0x0500030305000303),
SOC_ID("AST2600-A1", 0x0501030305010303),
SOC_ID("AST2620-A1", 0x0501020305010203),
SOC_ID("AST2600-A2", 0x0502030305010303),
SOC_ID("AST2620-A2", 0x0502020305010203),
SOC_ID("AST2605-A2", 0x0502010305010103),
SOC_ID("AST2600-A3", 0x0503030305030303),
SOC_ID("AST2620-A3", 0x0503020305030203),
SOC_ID("AST2605-A3", 0x0503010305030103),

Fixes: e0218dca5787 ("soc: aspeed: Add soc info driver")
Link: https://lore.kernel.org/r/20210210114651.334324-1-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
diff d0e72be7 Wed Feb 10 04:46:51 MST 2021 Joel Stanley <joel@jms.id.au> soc: aspeed: socinfo: Add new systems

Aspeed's u-boot sdk has been updated with the SoC IDs for the AST2605
variant, as well as A2 and A3 variants of the 2600 family.

>From u-boot's arch/arm/mach-aspeed/ast2600/scu_info.c:

SOC_ID("AST2600-A0", 0x0500030305000303),
SOC_ID("AST2600-A1", 0x0501030305010303),
SOC_ID("AST2620-A1", 0x0501020305010203),
SOC_ID("AST2600-A2", 0x0502030305010303),
SOC_ID("AST2620-A2", 0x0502020305010203),
SOC_ID("AST2605-A2", 0x0502010305010103),
SOC_ID("AST2600-A3", 0x0503030305030303),
SOC_ID("AST2620-A3", 0x0503020305030203),
SOC_ID("AST2605-A3", 0x0503010305030103),

Fixes: e0218dca5787 ("soc: aspeed: Add soc info driver")
Link: https://lore.kernel.org/r/20210210114651.334324-1-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
diff d0e72be7 Wed Feb 10 04:46:51 MST 2021 Joel Stanley <joel@jms.id.au> soc: aspeed: socinfo: Add new systems

Aspeed's u-boot sdk has been updated with the SoC IDs for the AST2605
variant, as well as A2 and A3 variants of the 2600 family.

>From u-boot's arch/arm/mach-aspeed/ast2600/scu_info.c:

SOC_ID("AST2600-A0", 0x0500030305000303),
SOC_ID("AST2600-A1", 0x0501030305010303),
SOC_ID("AST2620-A1", 0x0501020305010203),
SOC_ID("AST2600-A2", 0x0502030305010303),
SOC_ID("AST2620-A2", 0x0502020305010203),
SOC_ID("AST2605-A2", 0x0502010305010103),
SOC_ID("AST2600-A3", 0x0503030305030303),
SOC_ID("AST2620-A3", 0x0503020305030203),
SOC_ID("AST2605-A3", 0x0503010305030103),

Fixes: e0218dca5787 ("soc: aspeed: Add soc info driver")
Link: https://lore.kernel.org/r/20210210114651.334324-1-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
diff d0e72be7 Wed Feb 10 04:46:51 MST 2021 Joel Stanley <joel@jms.id.au> soc: aspeed: socinfo: Add new systems

Aspeed's u-boot sdk has been updated with the SoC IDs for the AST2605
variant, as well as A2 and A3 variants of the 2600 family.

>From u-boot's arch/arm/mach-aspeed/ast2600/scu_info.c:

SOC_ID("AST2600-A0", 0x0500030305000303),
SOC_ID("AST2600-A1", 0x0501030305010303),
SOC_ID("AST2620-A1", 0x0501020305010203),
SOC_ID("AST2600-A2", 0x0502030305010303),
SOC_ID("AST2620-A2", 0x0502020305010203),
SOC_ID("AST2605-A2", 0x0502010305010103),
SOC_ID("AST2600-A3", 0x0503030305030303),
SOC_ID("AST2620-A3", 0x0503020305030203),
SOC_ID("AST2605-A3", 0x0503010305030103),

Fixes: e0218dca5787 ("soc: aspeed: Add soc info driver")
Link: https://lore.kernel.org/r/20210210114651.334324-1-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
diff d0e72be7 Wed Feb 10 04:46:51 MST 2021 Joel Stanley <joel@jms.id.au> soc: aspeed: socinfo: Add new systems

Aspeed's u-boot sdk has been updated with the SoC IDs for the AST2605
variant, as well as A2 and A3 variants of the 2600 family.

>From u-boot's arch/arm/mach-aspeed/ast2600/scu_info.c:

SOC_ID("AST2600-A0", 0x0500030305000303),
SOC_ID("AST2600-A1", 0x0501030305010303),
SOC_ID("AST2620-A1", 0x0501020305010203),
SOC_ID("AST2600-A2", 0x0502030305010303),
SOC_ID("AST2620-A2", 0x0502020305010203),
SOC_ID("AST2605-A2", 0x0502010305010103),
SOC_ID("AST2600-A3", 0x0503030305030303),
SOC_ID("AST2620-A3", 0x0503020305030203),
SOC_ID("AST2605-A3", 0x0503010305030103),

Fixes: e0218dca5787 ("soc: aspeed: Add soc info driver")
Link: https://lore.kernel.org/r/20210210114651.334324-1-joel@jms.id.au
Signed-off-by: Joel Stanley <joel@jms.id.au>
/linux-master/drivers/media/platform/mediatek/vcodec/common/
H A Dmtk_vcodec_util.cdiff f19a771a Thu Dec 21 02:17:45 MST 2023 Fei Shao <fshao@chromium.org> media: mediatek: vcodec: Update mtk_vcodec_mem_free() error messages

In mtk_vcodec_mem_free(), there are two cases where a NULL VA is passed:
- mem->size == 0: we are called to free no memory. This may happen when
we call mtk_vcodec_mem_free() twice or the memory has never been
allocated.
- mem->size > 0: we are called to free memory but without VA. This means
that we failed to free the memory for real.

Both cases are not expected to happen, and we want to have clearer error
messages to describe which one we just encountered.
Update the error messages to include more information for that purpose.

Signed-off-by: Fei Shao <fshao@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
diff f19a771a Thu Dec 21 02:17:45 MST 2023 Fei Shao <fshao@chromium.org> media: mediatek: vcodec: Update mtk_vcodec_mem_free() error messages

In mtk_vcodec_mem_free(), there are two cases where a NULL VA is passed:
- mem->size == 0: we are called to free no memory. This may happen when
we call mtk_vcodec_mem_free() twice or the memory has never been
allocated.
- mem->size > 0: we are called to free memory but without VA. This means
that we failed to free the memory for real.

Both cases are not expected to happen, and we want to have clearer error
messages to describe which one we just encountered.
Update the error messages to include more information for that purpose.

Signed-off-by: Fei Shao <fshao@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
diff 56c0ac05 Tue Oct 10 06:20:10 MDT 2023 Yunfei Dong <yunfei.dong@mediatek.com> media: mediatek: vcodec: using encoder device to alloc/free encoder memory

Need to use encoder device to allocate/free encoder memory when calling
mtk_vcodec_mem_alloc/mtk_vcodec_mem_free, or leading to below crash log
when test encoder with decoder device.

pc : dma_alloc_attrs+0x44/0xf4
lr : mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common]
sp : ffffffc0209f3990
x29: ffffffc0209f39a0 x28: ffffff8024102a18 x27: 0000000000000000
x26: 0000000000000000 x25: ffffffc00c06e2d8 x24: 0000000000000001
x23: 0000000000000cc0 x22: 0000000000000010 x21: 0000000000000800
x20: ffffff8024102a18 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000009 x16: ffffffe389736a98 x15: 0000000000000078
x14: ffffffe389704434 x13: 0000000000000007 x12: ffffffe38a2b2560
x11: 0000000000000800 x10: 0000000000000004 x9 : ffffffe331f07484
x8 : 5400e9aef2395000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000cc0
x2 : ffffff8024102a18 x1 : 0000000000000800 x0 : 0000000000000010
Call trace:
dma_alloc_attrs+0x44/0xf4
mtk_vcodec_mem_alloc+0x50/0xa4 [mtk_vcodec_common 2819d3d601f3cd06c1f2213ac1b9995134441421]
h264_enc_set_param+0x27c/0x378 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
venc_if_set_param+0x4c/0x7c [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2ops_venc_start_streaming+0x1bc/0x328 [mtk_vcodec_enc 772cc3d26c254e8cf54079451ef8d930d2eb4404]
vb2_start_streaming+0x64/0x12c
vb2_core_streamon+0x114/0x158
vb2_streamon+0x38/0x60
v4l2_m2m_streamon+0x48/0x88
v4l2_m2m_ioctl_streamon+0x20/0x2c
v4l_streamon+0x2c/0x38
__video_do_ioctl+0x2c4/0x3dc
video_usercopy+0x404/0x934
video_ioctl2+0x20/0x2c
v4l2_ioctl+0x54/0x64
v4l2_compat_ioctl32+0x90/0xa34
__arm64_compat_sys_ioctl+0x128/0x13c
invoke_syscall+0x4c/0x108
el0_svc_common+0x98/0x104
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x2c/0x74
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x194/0x198
Code: aa0003f6 aa0203f4 aa0103f5 f900

'Fixes: 01abf5fbb081c ("media: mediatek: vcodec: separate struct 'mtk_vcodec_ctx'")'
Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
/linux-master/drivers/net/wireless/broadcom/brcm80211/brcmfmac/
H A Dcommon.cdiff 0f485805 Tue Feb 14 01:00:33 MST 2023 Hector Martin <marcan@marcan.st> wifi: brcmfmac: acpi: Add support for fetching Apple ACPI properties

On DT platforms, the module-instance and antenna-sku-info properties
are passed in the DT. On ACPI platforms, module-instance is passed via
the analogous Apple device property mechanism, while the antenna SKU
info is instead obtained via an ACPI method that grabs it from
non-volatile storage.

Add support for this, to allow proper firmware selection on Apple
platforms.

Signed-off-by: Hector Martin <marcan@marcan.st>
Reviewed-by: Julian Calaby <julian.calaby@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230214080034.3828-2-marcan@marcan.st
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
diff 660145d7 Fri Dec 30 00:51:39 MST 2022 Jisoo Jang <jisoo.jang@yonsei.ac.kr> wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds

Fix a stack-out-of-bounds read in brcmfmac that occurs
when 'buf' that is not null-terminated is passed as an argument of
strreplace() in brcmf_c_preinit_dcmds(). This buffer is filled with
a CLM version string by memcpy() in brcmf_fil_iovar_data_get().
Ensure buf is null-terminated.

Found by a modified version of syzkaller.

[ 33.004414][ T1896] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 33.013486][ T1896] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43236/3 wl0: Nov 30 2011 17:33:42 version 5.90.188.22
[ 33.021554][ T1896] ==================================================================
[ 33.022379][ T1896] BUG: KASAN: stack-out-of-bounds in strreplace+0xf2/0x110
[ 33.023122][ T1896] Read of size 1 at addr ffffc90001d6efc8 by task kworker/0:2/1896
[ 33.023852][ T1896]
[ 33.024096][ T1896] CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132
[ 33.024927][ T1896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 33.026065][ T1896] Workqueue: usb_hub_wq hub_event
[ 33.026581][ T1896] Call Trace:
[ 33.026896][ T1896] dump_stack_lvl+0x57/0x7d
[ 33.027372][ T1896] print_address_description.constprop.0.cold+0xf/0x334
[ 33.028037][ T1896] ? strreplace+0xf2/0x110
[ 33.028403][ T1896] ? strreplace+0xf2/0x110
[ 33.028807][ T1896] kasan_report.cold+0x83/0xdf
[ 33.029283][ T1896] ? strreplace+0xf2/0x110
[ 33.029666][ T1896] strreplace+0xf2/0x110
[ 33.029966][ T1896] brcmf_c_preinit_dcmds+0xab1/0xc40
[ 33.030351][ T1896] ? brcmf_c_set_joinpref_default+0x100/0x100
[ 33.030787][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.031223][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.031661][ T1896] ? lock_acquire+0x19d/0x4e0
[ 33.032091][ T1896] ? find_held_lock+0x2d/0x110
[ 33.032605][ T1896] ? brcmf_usb_deq+0x1a7/0x260
[ 33.033087][ T1896] ? brcmf_usb_rx_fill_all+0x5a/0xf0
[ 33.033582][ T1896] brcmf_attach+0x246/0xd40
[ 33.034022][ T1896] ? wiphy_new_nm+0x1476/0x1d50
[ 33.034383][ T1896] ? kmemdup+0x30/0x40
[ 33.034722][ T1896] brcmf_usb_probe+0x12de/0x1690
[ 33.035223][ T1896] ? brcmf_usbdev_qinit.constprop.0+0x470/0x470
[ 33.035833][ T1896] usb_probe_interface+0x25f/0x710
[ 33.036315][ T1896] really_probe+0x1be/0xa90
[ 33.036656][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.037026][ T1896] ? usb_match_id.part.0+0x88/0xc0
[ 33.037383][ T1896] driver_probe_device+0x49/0x120
[ 33.037790][ T1896] __device_attach_driver+0x18a/0x250
[ 33.038300][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.038986][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.039906][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.041412][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.041861][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.042330][ T1896] __device_attach+0x207/0x330
[ 33.042664][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.043026][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.043515][ T1896] bus_probe_device+0x1a2/0x260
[ 33.043914][ T1896] device_add+0xa61/0x1ce0
[ 33.044227][ T1896] ? __mutex_unlock_slowpath+0xe7/0x660
[ 33.044891][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.045531][ T1896] usb_set_configuration+0x984/0x1770
[ 33.046051][ T1896] ? kernfs_create_link+0x175/0x230
[ 33.046548][ T1896] usb_generic_driver_probe+0x69/0x90
[ 33.046931][ T1896] usb_probe_device+0x9c/0x220
[ 33.047434][ T1896] really_probe+0x1be/0xa90
[ 33.047760][ T1896] __driver_probe_device+0x2ab/0x460
[ 33.048134][ T1896] driver_probe_device+0x49/0x120
[ 33.048516][ T1896] __device_attach_driver+0x18a/0x250
[ 33.048910][ T1896] ? driver_allows_async_probing+0x120/0x120
[ 33.049437][ T1896] bus_for_each_drv+0x123/0x1a0
[ 33.049814][ T1896] ? bus_rescan_devices+0x20/0x20
[ 33.050164][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.050579][ T1896] ? trace_hardirqs_on+0x1c/0x120
[ 33.050936][ T1896] __device_attach+0x207/0x330
[ 33.051399][ T1896] ? device_bind_driver+0xb0/0xb0
[ 33.051888][ T1896] ? kobject_uevent_env+0x230/0x12c0
[ 33.052314][ T1896] bus_probe_device+0x1a2/0x260
[ 33.052688][ T1896] device_add+0xa61/0x1ce0
[ 33.053121][ T1896] ? __fw_devlink_link_to_suppliers+0x550/0x550
[ 33.053568][ T1896] usb_new_device.cold+0x463/0xf66
[ 33.053953][ T1896] ? hub_disconnect+0x400/0x400
[ 33.054313][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.054661][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.055094][ T1896] hub_event+0x10d5/0x3330
[ 33.055530][ T1896] ? hub_port_debounce+0x280/0x280
[ 33.055934][ T1896] ? __lock_acquire+0x1671/0x5790
[ 33.056387][ T1896] ? wq_calc_node_cpumask+0x170/0x2a0
[ 33.056924][ T1896] ? lock_release+0x640/0x640
[ 33.057383][ T1896] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 33.057916][ T1896] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 33.058402][ T1896] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 33.059019][ T1896] process_one_work+0x873/0x13e0
[ 33.059488][ T1896] ? lock_release+0x640/0x640
[ 33.059932][ T1896] ? pwq_dec_nr_in_flight+0x320/0x320
[ 33.060446][ T1896] ? rwlock_bug.part.0+0x90/0x90
[ 33.060898][ T1896] worker_thread+0x8b/0xd10
[ 33.061348][ T1896] ? __kthread_parkme+0xd9/0x1d0
[ 33.061810][ T1896] ? process_one_work+0x13e0/0x13e0
[ 33.062288][ T1896] kthread+0x379/0x450
[ 33.062660][ T1896] ? _raw_spin_unlock_irq+0x24/0x30
[ 33.063148][ T1896] ? set_kthread_struct+0x100/0x100
[ 33.063606][ T1896] ret_from_fork+0x1f/0x30
[ 33.064070][ T1896]
[ 33.064313][ T1896]
[ 33.064545][ T1896] addr ffffc90001d6efc8 is located in stack of task kworker/0:2/1896 at offset 512 in frame:
[ 33.065478][ T1896] brcmf_c_preinit_dcmds+0x0/0xc40
[ 33.065973][ T1896]
[ 33.066191][ T1896] this frame has 4 objects:
[ 33.066614][ T1896] [48, 56) 'ptr'
[ 33.066618][ T1896] [80, 148) 'revinfo'
[ 33.066957][ T1896] [192, 210) 'eventmask'
[ 33.067338][ T1896] [256, 512) 'buf'
[ 33.067742][ T1896]
[ 33.068304][ T1896] Memory state around the buggy address:
[ 33.068838][ T1896] ffffc90001d6ee80: f2 00 00 02 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 33.069545][ T1896] ffffc90001d6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.070626][ T1896] >ffffc90001d6ef80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 33.072052][ T1896] ^
[ 33.073043][ T1896] ffffc90001d6f000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074230][ T1896] ffffc90001d6f080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.074914][ T1896] ==================================================================
[ 33.075713][ T1896] Disabling lock debugging due to kernel taint

Reviewed-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221230075139.56591-1-jisoo.jang@yonsei.ac.kr
/linux-master/drivers/gpu/drm/xe/
H A Dxe_ttm_sys_mgr.c1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1a545ed7 Mon Apr 03 16:20:31 MDT 2023 Chang, Bruce <yu.bruce.chang@intel.com> drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740] <TASK>
[ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617] drm_dev_release+0x30/0x50 [drm]
[ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
/linux-master/drivers/net/ethernet/hisilicon/hns/
H A Dhns_enet.hdiff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 27463ad9 Wed Jul 05 20:22:00 MDT 2017 Yunsheng Lin <linyunsheng@huawei.com> net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635] alloc_debug_processing+0x18c/0x1a0
[17659.117208] __slab_alloc+0x52c/0x560
[17659.120909] kmem_cache_alloc_node+0xac/0x2c0
[17659.125309] __alloc_skb+0x6c/0x260
[17659.128837] tcp_send_ack+0x8c/0x280
[17659.132449] __tcp_ack_snd_check+0x9c/0xf0
[17659.136587] tcp_rcv_established+0x5a4/0xa70
[17659.140899] tcp_v4_do_rcv+0x27c/0x620
[17659.144687] tcp_prequeue_process+0x108/0x170
[17659.149085] tcp_recvmsg+0x940/0x1020
[17659.152787] inet_recvmsg+0x124/0x180
[17659.156488] sock_recvmsg+0x64/0x80
[17659.160012] SyS_recvfrom+0xd8/0x180
[17659.163626] __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000] free_debug_processing+0x1d4/0x2c0
[17659.178486] __slab_free+0x240/0x390
[17659.182100] kmem_cache_free+0x24c/0x270
[17659.186062] kfree_skbmem+0xa0/0xb0
[17659.189587] __kfree_skb+0x28/0x40
[17659.193025] napi_gro_receive+0x168/0x1c0
[17659.197074] hns_nic_rx_up_pro+0x58/0x90
[17659.201038] hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352] hns_nic_common_poll+0x94/0x140
[17659.209576] net_rx_action+0x458/0x5e0
[17659.213363] __do_softirq+0x1b8/0x480
[17659.217062] run_ksoftirqd+0x64/0x80
[17659.220679] smpboot_thread_fn+0x224/0x310
[17659.224821] kthread+0x150/0x170
[17659.228084] ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490] __slab_alloc+0x52c/0x560
[17751.084188] kmem_cache_alloc+0x244/0x280
[17751.088238] __build_skb+0x40/0x150
[17751.091764] build_skb+0x28/0x100
[17751.095115] __alloc_rx_skb+0x94/0x150
[17751.098900] __napi_alloc_skb+0x34/0x90
[17751.102776] hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097] hns_nic_common_poll+0x94/0x140
[17751.111333] net_rx_action+0x458/0x5e0
[17751.115123] __do_softirq+0x1b8/0x480
[17751.118823] run_ksoftirqd+0x64/0x80
[17751.122437] smpboot_thread_fn+0x224/0x310
[17751.126575] kthread+0x150/0x170
[17751.129838] ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951] free_debug_processing+0x1d4/0x2c0
[17751.144436] __slab_free+0x240/0x390
[17751.148051] kmem_cache_free+0x24c/0x270
[17751.152014] kfree_skbmem+0xa0/0xb0
[17751.155543] __kfree_skb+0x28/0x40
[17751.159022] napi_gro_receive+0x168/0x1c0
[17751.163074] hns_nic_rx_up_pro+0x58/0x90
[17751.167041] hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358] hns_nic_common_poll+0x94/0x140
[17751.175585] net_rx_action+0x458/0x5e0
[17751.179373] __do_softirq+0x1b8/0x480
[17751.183076] run_ksoftirqd+0x64/0x80
[17751.186691] smpboot_thread_fn+0x224/0x310
[17751.190826] kthread+0x150/0x170
[17751.194093] ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
/linux-master/drivers/cpufreq/
H A Damd-pstate-ut.cdiff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
diff 60dd2838 Fri Aug 18 05:44:52 MDT 2023 Swapnil Sapkal <swapnil.sapkal@amd.com> cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver

After loading the amd-pstate-ut driver, amd_pstate_ut_check_perf()
and amd_pstate_ut_check_freq() use cpufreq_cpu_get() to get the policy
of the CPU and mark it as busy.

In these functions, cpufreq_cpu_put() should be used to release the
policy, but it is not, so any other entity trying to access the policy
is blocked indefinitely.

One such scenario is when amd_pstate mode is changed, leading to the
following splat:

[ 1332.103727] INFO: task bash:2929 blocked for more than 120 seconds.
[ 1332.110001] Not tainted 6.5.0-rc2-amd-pstate-ut #5
[ 1332.115315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1332.123140] task:bash state:D stack:0 pid:2929 ppid:2873 flags:0x00004006
[ 1332.123143] Call Trace:
[ 1332.123145] <TASK>
[ 1332.123148] __schedule+0x3c1/0x16a0
[ 1332.123154] ? _raw_read_lock_irqsave+0x2d/0x70
[ 1332.123157] schedule+0x6f/0x110
[ 1332.123160] schedule_timeout+0x14f/0x160
[ 1332.123162] ? preempt_count_add+0x86/0xd0
[ 1332.123165] __wait_for_common+0x92/0x190
[ 1332.123168] ? __pfx_schedule_timeout+0x10/0x10
[ 1332.123170] wait_for_completion+0x28/0x30
[ 1332.123173] cpufreq_policy_put_kobj+0x4d/0x90
[ 1332.123177] cpufreq_policy_free+0x157/0x1d0
[ 1332.123178] ? preempt_count_add+0x58/0xd0
[ 1332.123180] cpufreq_remove_dev+0xb6/0x100
[ 1332.123182] subsys_interface_unregister+0x114/0x120
[ 1332.123185] ? preempt_count_add+0x58/0xd0
[ 1332.123187] ? __pfx_amd_pstate_change_driver_mode+0x10/0x10
[ 1332.123190] cpufreq_unregister_driver+0x3b/0xd0
[ 1332.123192] amd_pstate_change_driver_mode+0x1e/0x50
[ 1332.123194] store_status+0xe9/0x180
[ 1332.123197] dev_attr_store+0x1b/0x30
[ 1332.123199] sysfs_kf_write+0x42/0x50
[ 1332.123202] kernfs_fop_write_iter+0x143/0x1d0
[ 1332.123204] vfs_write+0x2df/0x400
[ 1332.123208] ksys_write+0x6b/0xf0
[ 1332.123210] __x64_sys_write+0x1d/0x30
[ 1332.123213] do_syscall_64+0x60/0x90
[ 1332.123216] ? fpregs_assert_state_consistent+0x2e/0x50
[ 1332.123219] ? exit_to_user_mode_prepare+0x49/0x1a0
[ 1332.123223] ? irqentry_exit_to_user_mode+0xd/0x20
[ 1332.123225] ? irqentry_exit+0x3f/0x50
[ 1332.123226] ? exc_page_fault+0x8e/0x190
[ 1332.123228] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1332.123232] RIP: 0033:0x7fa74c514a37
[ 1332.123234] RSP: 002b:00007ffe31dd0788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1332.123238] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa74c514a37
[ 1332.123239] RDX: 0000000000000008 RSI: 000055e27c447aa0 RDI: 0000000000000001
[ 1332.123241] RBP: 000055e27c447aa0 R08: 00007fa74c5d1460 R09: 000000007fffffff
[ 1332.123242] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 1332.123244] R13: 00007fa74c61a780 R14: 00007fa74c616600 R15: 00007fa74c615a00
[ 1332.123247] </TASK>

Fix this by calling cpufreq_cpu_put() wherever necessary.

Fixes: 14eb1c96e3a3 ("cpufreq: amd-pstate: Add test module for amd-pstate driver")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Completed in 1918 milliseconds

1234567891011>>